MoEC: Mixture of Expert Clusters
Authors
Abstract
Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE models convert dense layers into sparse experts, and utilize a gated routing network to make experts conditionally activated. However, as the number of experts grows, MoE with outrageous parameters suffers from overfitting and sparse data allocation. Such problems are especially severe on tasks with limited data, thus hindering progress towards improving performance by scaling up. We verify that there exists a performance upper bound while scaling up sparse MoE. In this work, we propose Mixture of Expert Clusters (MoEC), a general approach that enables the experts to learn more diverse and appropriate knowledge by imposing variance-based constraints on the routing stage. Given this, we could further propose a cluster-level expert dropout strategy specifically designed for the cluster structure. Our experiments reveal that MoEC could improve performance on machine translation and natural language understanding tasks. MoEC also plays a positive role in mitigating overfitting and data allocation problems, thus fully releasing the potential of large-scale sparse models.
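The abstract describes the mechanism only at a high level, so the following is a minimal, hypothetical sketch of how a gated MoE layer with expert clusters and cluster-level dropout might look in PyTorch. The class name ClusteredMoELayer, the hyperparameters (n_experts, n_clusters, cluster_drop_p), the top-1 routing, the contiguous cluster assignment, and the particular variance penalty are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClusteredMoELayer(nn.Module):
    """Illustrative sketch: a top-1 gated MoE layer whose experts are grouped
    into equal-sized clusters, with cluster-level dropout on the routing logits."""

    def __init__(self, d_model, n_experts=8, n_clusters=4, cluster_drop_p=0.25):
        super().__init__()
        assert n_experts % n_clusters == 0
        self.n_clusters = n_clusters
        self.cluster_size = n_experts // n_clusters
        self.cluster_drop_p = cluster_drop_p
        self.gate = nn.Linear(d_model, n_experts)  # gated routing network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def routing_variance_penalty(self, probs):
        # One possible reading of a "variance-based constraint on the routing
        # stage" (an assumption, not necessarily the paper's formulation):
        # penalize the variance of routing probabilities across the experts
        # inside each cluster, pulling clustered experts toward similar roles.
        per_cluster = probs.view(-1, self.n_clusters, self.cluster_size)
        return per_cluster.var(dim=-1).mean()

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.gate(x)  # (n_tokens, n_experts)
        aux = self.routing_variance_penalty(F.softmax(logits, dim=-1))

        if self.training:
            # Cluster-level dropout (assumed form): mask the routing logits of
            # whole clusters, so entire groups of experts are skipped at once.
            keep = torch.rand(self.n_clusters, device=x.device) > self.cluster_drop_p
            if not keep.any():  # always keep at least one cluster
                keep[torch.randint(self.n_clusters, (1,), device=x.device)] = True
            expert_mask = keep.repeat_interleave(self.cluster_size)
            logits = logits.masked_fill(~expert_mask, float("-inf"))

        probs = F.softmax(logits, dim=-1)
        top_p, top_idx = probs.max(dim=-1)  # top-1 conditional activation

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = top_idx == e
            if sel.any():
                out[sel] = top_p[sel].unsqueeze(-1) * expert(x[sel])
        return out, aux
```

In use, a training loop would add the returned aux term, scaled by a small coefficient, to the task loss; that coefficient and the fixed cluster assignment are likewise choices made for the sketch rather than details taken from the paper.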
Similar resources
A Mixture Model for Expert Finding
This paper addresses the issue of identifying persons with expert knowledge on a given topic. Traditional methods usually estimate the relevance between the query and the support documents of candidate experts using, for example, a language model. However, the language model lacks the ability to identify semantic knowledge, and thus some right experts cannot be found due to not occ...
Mixture of Expert Agents for Handling Imbalanced Data Sets
Many real-world data sets exhibit skewed class distributions in which almost all cases are allotted to a class and far fewer cases to a smaller, usually more interesting class. A classifier induced from an imbalanced data set has, typically, a low error rate for the majority class and an unacceptable error rate for the minority class. This paper firstly provides a systematic study on the variou...
On linear mixture of expert approaches to information retrieval
Knowledge-intensive organizations have a vast array of information contained in large document repositories. With the advent of E-commerce and corporate intranets/extranets, these repositories are expected to grow at a fast pace. This explosive growth has led to huge, fragmented, and unstructured document collections. Although it has become easier to collect and store information in document coll...
Simultaneous Feature and Expert Selection within Mixture of Experts
A useful strategy to deal with complex classification scenarios is the "divide and conquer" approach. The mixture of experts (MOE) technique makes use of this strategy by jointly training a set of classifiers, or experts, that are specialized in different regions of the input space. A global model, or gate function, complements the experts by learning a function that weights their relevance in d...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26617